graph TD
AI["Artificial Intelligence"] --> ML["Machine Learning"]
ML --> DS["Data Science"]
DS --> TS["Time Series"]
2024-10-10
This presentation offers a friendly introduction to Time Series, designed for everyone at ProCogia. We’ll begin with a brief introduction to Dave, who is leading our Data Science platform strategy. From there, we’ll explore what sets time series data apart from other types of data, dive into analysis and forecasting using an R framework, and showcase engaging examples of time series from current events. Finally, we’ll conclude with a discussion on Quarto, the technical communication tool used to create this presentation.
Please note: This presentation includes data from the Gun Violence Archive, and some discussion around shooting fatalities which some may find upsetting.
Data Science
Extracting insights from data using analysis and machine learning.
DataOps
Streamlining data management and processes for reliable data operations.
DevOps
Integrating development and operations for faster, automated deployments.
Data Science
DataOps
DevOps
★
Data Science
DataOps
DevOps
★
★
Platform
Installs
Pipelines
BI/
Analytics
Intent: Move from 2-discipline overlaps into the sweetspot.
tidyverts a ts framework for Rgraph TD
AI["Artificial Intelligence"] --> ML["Machine Learning"]
ML --> DS["Data Science"]
DS --> TS["Time Series"]

tidyverts
library(fpp3) # tidyverts
# Example: Monthly Air Passenger Data
air_passengers <- as_tsibble(AirPassengers)
# Simple Forecasting Model
fit <- air_passengers |>
model(ETS(value ~ error() + trend() + season()))
# Forecast the next 12 periods
fc <- fit %>% forecast(h = 12)
# Plot the time series with forecast
autoplot(air_passengers, value) +
autolayer(fc) +
labs(title = "Air Passenger Forecast",
y = "Passengers",
x = "Year")
Courtesy of Gun Violence Archive
This is an initial look at the Malcolm Gladwell assertion from Revisionist History podcast Guns Part 4: Moral Hazard. The assertion is that gun homicide statistics, which are decreasing, are masking a rise in total gun violence incidents because non-fatal shooting are not counted in the data.
Note: the data from the Gun Violence Archive includes both fatality and injury-only incidents and is compiled from media reports. Incidents that were reported to the police, but not found in public reports is not included.


| Incident ID | Date | State | City | severity | fatality |
|---|---|---|---|---|---|
| 2726996 | 2023-10-10 | California | Los Angeles | Victim Injury | FALSE |
| 2726951 | 2023-10-10 | California | Los Angeles | Victim Injury | FALSE |
| 2727033 | 2023-10-09 | California | Rancho Cordova | Victim Fatality | TRUE |
| 2726961 | 2023-10-09 | California | Bakersfield | Victim Injury | FALSE |
| 2726332 | 2023-10-09 | California | Oakland | Victim Injury | FALSE |
| 2726327 | 2023-10-09 | California | Oakland | Victim Injury | FALSE |
| 2726295 | 2023-10-09 | California | West Hollywood | Victim Injury | FALSE |
| 2726301 | 2023-10-09 | California | Los Angeles | Victim Injury | FALSE |
| 2726921 | 2023-10-09 | California | San Francisco | Victim Fatality | TRUE |
| 2726173 | 2023-10-08 | California | Los Angeles | Victim Injury | FALSE |
| 2726359 | 2023-10-08 | California | Stockton | Victim Fatality | TRUE |
| 2727111 | 2023-10-08 | California | Santa Ana | Victim Fatality | TRUE |
| 2726310 | 2023-10-08 | California | West Covina | Victim Fatality | TRUE |
| 2726362 | 2023-10-08 | California | Sacramento | Victim Fatality | TRUE |
| 2726291 | 2023-10-08 | California | Fresno | Victim Injury | FALSE |
| 2725386 | 2023-10-07 | California | Long Beach | Victim Injury | FALSE |
| 2725869 | 2023-10-07 | California | Oakland | Victim Injury | FALSE |
| 2725880 | 2023-10-07 | California | Oakland | Victim Injury | FALSE |
| 2725110 | 2023-10-07 | California | Canoga Park | Suspect Fatality | TRUE |
| 2725878 | 2023-10-07 | California | Oakland | Victim Injury | FALSE |
| 2725876 | 2023-10-07 | California | Oakland | Victim Injury | FALSE |
| 2726376 | 2023-10-07 | California | San Francisco | Victim Fatality | TRUE |
| 2725669 | 2023-10-07 | California | Inglewood | Victim Fatality | TRUE |
| 2725113 | 2023-10-07 | California | Los Angeles | Victim Injury | FALSE |
| 2725117 | 2023-10-06 | California | Gardena | Victim Injury | FALSE |
| 2724595 | 2023-10-06 | California | Los Angeles | Victim Fatality | TRUE |
| 2724655 | 2023-10-06 | California | Santa Barbara | Victim Injury | FALSE |
| 2725119 | 2023-10-06 | California | Richmond | Victim Fatality | TRUE |
| 2725115 | 2023-10-06 | California | Los Angeles | Victim Injury | FALSE |
| 2724590 | 2023-10-05 | California | Downey | Victim Injury | FALSE |
| 2724584 | 2023-10-05 | California | Long Beach | Victim Injury | FALSE |
| 2724524 | 2023-10-05 | California | Lodi | Victim Injury | FALSE |
| 2723769 | 2023-10-05 | California | Fresno | Suspect Injury | FALSE |
| 2723728 | 2023-10-05 | California | Rancho Murieta | Victim Injury | FALSE |
| 2723457 | 2023-10-04 | California | Los Angeles | Victim Injury | FALSE |
| 2723347 | 2023-10-04 | California | Los Angeles | Victim Fatality | TRUE |
| 2722903 | 2023-10-04 | California | Stockton | Victim Injury | FALSE |
| 2722741 | 2023-10-04 | California | Los Angeles | Victim Injury | FALSE |
| 2723427 | 2023-10-04 | California | Long Beach | Victim Fatality | TRUE |
| 2721653 | 2023-10-03 | California | Los Angeles | Victim Injury | FALSE |
| 2722524 | 2023-10-03 | California | Anaheim | Suspect Fatality | TRUE |
| 2722173 | 2023-10-03 | California | Lodi | Victim Fatality | TRUE |
| 2722899 | 2023-10-03 | California | Stockton | Victim Injury | FALSE |
| 2721835 | 2023-10-03 | California | Sacramento | Victim Injury | FALSE |
| 2722511 | 2023-10-03 | California | Los Angeles | Victim Injury | FALSE |
| 2722731 | 2023-10-03 | California | Stockton | Victim Injury | FALSE |
| 2721666 | 2023-10-03 | California | Lancaster | Suspect Injury | FALSE |
| 2721380 | 2023-10-02 | California | Los Angeles | Victim Injury | FALSE |
| 2723912 | 2023-10-02 | California | North Hills | Victim Injury | FALSE |
| 2721677 | 2023-10-02 | California | Atwater | Victim Fatality | TRUE |
gva_ts_yr <- gva3state |>
filter(Date < ymd('2023-10-01'),
Date >= ymd('2020-10-01')) |>
count(Date, State, fatality, name = 'Incidents') |>
as_tsibble(index = Date,
key = c(State, fatality)) |>
fill_gaps(Incidents = 0L)
gva_ts_yr_plt <- gva_ts_yr |>
model(stl = STL(Incidents)) |>
components() |>
autoplot()
gva_ts_yr_pltgva_3city <- gva3state |>
filter(Date < ymd('2023-10-01'),
Date >= ymd('2020-10-01'),
City %in% c('Chicago', 'Los Angeles', 'Indianapolis')) |>
count(Date, City, name = 'Incidents') |>
as_tsibble(index = Date,
key = c(City)) |>
fill_gaps(Incidents = 0L)
gva_3city_train <- gva_3city |> filter(Date < ymd('2023-09-01'))
gva_3city_test <- gva_3city |> filter(Date >= ymd('2023-09-01'))
gva_3city_fit <- gva_3city_train |>
model(ETS = ETS(Incidents))
# Forecast the next 12 periods
gva_3city_fc <- gva_3city_fit %>% forecast(h = 30)
# Plot the time series with forecast
gva_3city_train |>
filter(Date >= ymd('2023-07-01')) |>
autoplot(Incidents) +
autolayer(gva_3city_fc, Incidents, alpha = 0.5) +
autolayer(gva_3city_test, Incidents, alpha = 0.3)+
labs(title = "Metropolis GVA September Incidents Forecast") +
facet_wrap(. ~ City, ncol = 1, scales = "free_y")
chi_fit <- gva_chi_train |>
model(ETS(Incidents))
# Forecast the next 12 periods
chi_fc <- chi_fit %>% forecast(h = 30)
# Plot the time series with forecast
gva_chi_train |>
filter(year(Date) == 2023) |>
autoplot(Incidents) +
autolayer(chi_fc, Incidents, alpha = 0.5) +
autolayer(gva_chi_test, Incidents, alpha = 0.3)+
labs(title = "Chicago GVA September Incidents Forecast")
ARIMA(y ~ pdq() + PDQ() )
AR(p) where:
\[ X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \epsilon_t \]
MA(q)
\[ X_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q} \]
d - is the number of differences
P,D,Q - are the seasonal terms
graph TD
classDef boxStyle padding:30px;
Step1["Plot the data. Understand the Patterns"] --> Step2["Stablize Variables (Box-Cox)"]
Step2 --> Choose["Select Method"]
Choose --> |Auto| StepA1["Use ARIMA()"]
Choose --> |Manual| StepB1["Difference to make series stationary"]
StepB1 --> StepB2["Plot ACF/PACF"]
StepB2 --> StepB3["Use ARIMA(var ~ pdq() + PDQ())"]
StepB3 --> Residuals["Check Residuals and ACF"]
StepA1 --> Residuals
Residuals --> |No| StepB2
Residuals --> |Yes| Forecast["Calculate Forcast"]
style Choose fill:#9f9,stroke:#333,stroke-width:2px,rx:50,ry:50;
style Residuals fill:#9f9,stroke:#333,stroke-width:2px,rx:50,ry:50;
Constant variance, but trend and seasonality are present.
Seasonality is removed, but trend exists.

Looks like whith noise. Let’s do a statistical test.
\(H_0\) is that series is stationary:
chi_fit <- gva_chi_train |>
model(auto_arima = ARIMA(Incidents),
#auto_arima2 = ARIMA(Incidents, stepwise = FALSE, approx = FALSE),
auto_dfix = ARIMA(Incidents ~ 0 + pdq(1,1,1) + PDQ(0,1,2)),
arima_ar = ARIMA(Incidents ~ 0 + pdq(2,0,0) + PDQ(4,1,0)),
arima_ma = ARIMA(Incidents ~ 0 + pdq(0,0,2) + PDQ(0,1,1)),
ets = ETS(Incidents),
snaive = SNAIVE(Incidents))
# Forecast the next 12 periods
chi_fc <- chi_fit %>% forecast(h = 30)
chi_fit |> pivot_longer(everything())# A mable: 6 x 2
# Key: name [6]
name value
<chr> <model>
1 auto_arima <ARIMA(1,1,1)(0,0,2)[7]>
2 auto_dfix <ARIMA(1,1,1)(0,1,2)[7]>
3 arima_ar <ARIMA(2,0,0)(4,1,0)[7]>
4 arima_ma <ARIMA(0,0,2)(0,1,1)[7]>
5 ets <ETS(A,N,A)>
6 snaive <SNAIVE>

| .model | RMSE |
|---|---|
| ets | 2.446857 |
| auto_dfix | 2.448216 |
| arima_ma | 2.763591 |
| auto_arima | 2.857696 |
| arima_ar | 3.010882 |
| snaive | 4.151305 |
fable::prophet()nnetar, xgboostgithub.iotidyverts?